Floating-Point Arithmetic on Round-to-Nearest Representations
Abstract
Recently we introduced a class of number representations, denoted RN-representations, that allow unbiased rounding-to-nearest to take place by a simple truncation. In this paper we briefly review the binary fixed-point representation in an encoding that is essentially an ordinary 2’s complement representation with an appended round bit. Not only is this rounding a constant-time operation, so is sign inversion; on ordinary 2’s complement representations both are at best log-time operations. Addition, multiplication and division are defined in such a way that rounding information can be carried along in a meaningful way, at minimal cost. Based on the fixed-point encoding, we then define a floating-point representation and describe in some detail a possible implementation of a floating-point arithmetic unit employing this representation, including the directed roundings.
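A minimal Python sketch of the fixed-point encoding, assuming that a pair (a, r) represents the value a + r·ulp, where a is read as a 2’s complement integer and ulp = 2^lsb is the weight of its last bit. Under that assumption, negation is (~a, 1 − r), because −(a + r·ulp) = (~a + ulp) − r·ulp, and rounding to nearest is truncation: drop the k low bits of a and keep the first discarded bit as the new round bit. The class `RN`, its `lsb` field and the helpers `neg`, `truncate` and `add` are hypothetical names; the addition rule shown is one exact choice consistent with the value identity, not necessarily the definition used in the paper.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class RN:
    """Hypothetical fixed-point RN-representation (a, r).

    a   : 2's complement significand (Python ints act as infinite-width 2's complement)
    r   : appended round bit, 0 or 1
    lsb : exponent of the unit in the last place, so ulp = 2**lsb
    Assumed represented value: (a + r) * 2**lsb.
    """
    a: int
    r: int
    lsb: int = 0

    def value(self) -> Fraction:
        return Fraction(self.a + self.r) * Fraction(2) ** self.lsb

    def neg(self) -> "RN":
        # -(a + r*ulp) = (~a + ulp) - r*ulp = ~a + (1 - r)*ulp:
        # complement every bit and flip the round bit, with no carry propagation.
        return RN(~self.a, 1 - self.r, self.lsb)

    def truncate(self, k: int) -> "RN":
        # Round to nearest at position lsb + k by plain truncation:
        # keep the high bits of a; the first discarded bit becomes the round bit.
        if k < 1:
            raise ValueError("k must be at least 1")
        return RN(self.a >> k, (self.a >> (k - 1)) & 1, self.lsb + k)


def add(x: RN, y: RN) -> RN:
    # Exact sum for operands with the same ulp, using rx + ry == (rx & ry) + (rx | ry):
    # one of the two round bits is folded into the integer sum, the other is kept.
    if x.lsb != y.lsb:
        raise ValueError("operands must share the same lsb")
    return RN(x.a + y.a + (x.r & y.r), x.r | y.r, x.lsb)


x = RN(0b0110, 1, -4)        # (6 + 1) / 16 = 7/16
print(x.value())             # 7/16
print(x.truncate(2).value()) # 1/2, the multiple of 1/4 nearest to 7/16
print(x.neg().value())       # -7/16
```

In this sketch `truncate` never adds a carry into `a`: the rounding increment stays implicit in the round bit, which is what makes the operation constant time.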
Journal: CoRR
Volume: abs/1201.3914
Publication date: 2011